
An overview of contemporary measurement practices with a focus on emergent
trends and implications for research workers and instructional personnel. Also
attempted is a selective critical commentary based upon empirical research
related to measurement technique.
Measurement Practices in Early Childhood Education

INTRODUCTION

There can be little question that generalizations about child development
and behavior, teacher effectiveness, and the worth of educational programs are
no better than the data from which they come. Such data are in turn a function
of the strategies and techniques used to measure behavior. In this chapter
current measurement strategies and techniques applicable to early childhood
education are selectively examined. This examination includes a look at the
basic measurement problems faced by both teacher educators and teachers of
young children. Special emphasis is placed on how to measure behavior.

The task of writing this chapter was approached with all the verve of a
Coronado searching for the Seven Cities of Gold. It was hoped that somehow
the spadework for this task would uncover a wealth of innovative measurement
techniques, especially techniques that reflect advances over the many shopworn
procedures that have so long dominated educational practice. As with Coronado,
and not surprisingly, these hopes were frustrated. In this writer's judgment, few
if any genuine breakthroughs in behavioral measurement relevant for practitioners
have occurred in recent years. To be sure, there have emerged many variations
on the traditional themes of testing and observation. Moreover, a staggering
proliferation of new, but conventional, measuring instruments has occurred.
This is notably true for measures of preschool children's language development
and pre-academic skills. But innovative measurement techniques that are both
valid and practical for widespread use in the field are indeed few and far
between.

This view is shared by others who have recently dealt with the role of
measurement in the evaluation of early childhood programs. For example, Kamii
and Elliot (1971) have called for the development of measurement techniques
better to match the program objectives of new curricula for young children.
These authorities are especially critical of the use of conventional, standardized
tests of intelligence, visual perception, vocabulary, and psycholinguistic
abilities for the summative evaluation of early childhood programs. The main
reason for such criticism is that such instruments simply are not designed for
this purpose. Rather, instruments of this kind are usually constructed to
classify children, diagnose possible learning disorders, or predict subsequent
learning and development.

This problem applies also to procedures for comparing various instructional
programs for young children. For example, Butler (1970) has remarked in a
review of research about early intervention programs:

"Instrumentation is a particular problem. What kind of
instrumentation is valid if one wishes to compare the outcomes of
a cognitive, direct-instruction program with a much more broadly
based, informally organised program? What can changes in IQ indicate
about the outcomes of these programs when other aspects are
not measured?" (Butler, 1970, p. 18).

Some readers will surely quarrel with the pessimistic tone of these
conclusions. Therefore, possible exceptions to the general situation should be
mentioned. For example, considerable advances in the technique of computer-assisted
branched testing have been made (Holtzman, 1971). However, the application
of this technique depends upon an elaborate and expensive set of hardware
and technical know-how seldom found in school settings. Perhaps even more
exciting is the potential for unobtrusive, or inconspicuous, measures in various
early childhood education endeavors (Webb et al., 1966). To date, however,
this potential has not been fully explored.



Trends in Measurement Practice

Lest the foregoing be taken as too discouraging a perception of the measurement
field, several encouraging trends in measurement practice can be noted.
These trends largely involve developments in the measurement of children's
behavior, although a few concern more directly certain other variables such as
curriculum components and institutional change. Consider first those trends
specific to the measurement of young children's behavior.

The Measurement of Young Children's Behavior

Under this topic, at least seven trends can be identified. First, the
range of measures available for use with young children has increased rapidly
in the past several years. For example, a practitioner concerned with preschoolers
is no longer limited to intelligence scales, developmental "schedules," and
highly experimental measures of learning ability. (See Appendix for a listing
of recently published tests and scales designed for use with infants,
preschoolers, and early school age children.) Especially notable is the move
toward comprehensive assessment of children's language development, an area
that, except for vocabulary development, was until recently sadly neglected
in many early childhood education programs. See Cazden (1971) for an overview
of procedures for measuring young children's language development.

Second, and related to the first, is a growing concern for the measurement
of children's affect, including motives, attitudes, and self-esteem. This trend
is reflected in several ways, including the widespread belief among many early
childhood educators that the cultivation of a child's affective life is as
important as, if not more important than, the cultivation of his intellect. An
increasing number of early education programs include explicit goals for affective
development. Thus there has emerged a need for suitable measures of this aspect of
children's behavior. Research about constructs such as intellectual achievement
responsibility (Crandall, Katkovsky, and Crandall, 1965), achievement motivation
(Heckhausen, 1967), and self-esteem (Coopersmith, 1967) exemplifies the dynamic
interaction of affective and cognitive developmental factors in children.

A third trend is a growing awareness and respect among measurement specialists
and educators for assessing individual differences along cultural-linguistic lines.
This trend, described elsewhere as a decrease in the ethnocentrism of psychological
assessment (Holtzman, 1971), is perhaps most apparent in the measurement of scholastic
aptitude and language competence. For example, some authorities (e.g., Baratz, 1969)
are pressing strongly for the construction and administration of tests in the
dialect or native language of minority group children. Frequently, such authorities
also recommend that test content be altered better to reflect the cultural
background of minorities. The educational advantages of such a move have yet to be
fully explicated on empirical grounds. However, this approach stands in marked
contrast to the practice of using, for all ethnic minority children, measures of
language and academic skills valued by white, middle class adults (including
psychologists). The general issue of cultural bias in testing is considered more
fully later in this chapter.

Fourth, there currently appears to be less emphasis on the use of formal tests
alone for measuring the behavior of children (and teachers), and a greater emphasis
on the use of other techniques, including systematic observation (McReynolds, 1968).
This is clearly reflected in the growing popularity of process observation procedures,
such as interaction analysis and microteaching. It is also indicated by the focus on
children's products (e.g., stories, scientific experiments, art work, and other
creative outputs) by proponents of "open education." These notions are considered
in more detail in the section of this chapter concerned with teacher-made tests.
A fifth trend concerns the development of systematic procedures for screening
preschool and kindergarten-entry children in order better to tailor their
preacademic and early academic experiences. These procedures, typically developed
at the local school district level in relation to specific programs, stand
in contrast to the traditional, global "readiness" test approach or informal
teacher ratings of developmental status. Aside from the general purpose of better
educational planning for school beginners, screening measurements often provide
data for the early detection of learning and behavior disorders. Examples of
screening practices based upon the formal application of tests and scales include
Ahr (1967), Conrad and Tobiessen (1967), and Rea and Reys (1970). Rogolsky
(1969) has provided a brief review of developments in this area. The value of
early screening is also reflected in the creation of new tests of children's
competence with concepts deemed basic to early school success (e.g., Boehm, 1969;
Moss, 1970).

Sixth, there has been a marked increase in infant assessment during the
past decade, especially in relation to infant stimulation studies (e.g., Painter,
1968; White, 1971). This interest indicates both a renewed concern for the
diagnosis of early developmental status and a bias in developmental theory
regarding the importance of early experiences for overall development (Stedman,
1966). Instruments for infant assessment are reviewed by Thomas (1970). These
are largely concerned with early cognitive and sensory-motor behaviors. Practically
no useful measures of early affective development (birth to age 3) are
reported in the recent literature about infancy.

Finally, a growing number of resource books and services relevant to measurement
in childhood education are appearing. Hopefully, this means that educators
are becoming more aware both of the need for and value of judicious
measurement practices in their work with young children. Examples include
Beatty (1969), Bloom et al. (1971), Goolsby and Darby (1969), Hess et al. (1967),
Jenkins et al. (1966), Johnson and Bommarito (1971), McReynolds (1968), Palmer
(1970), and Savage (1968). To these resources can be added information services
such as the Test Collection Bulletin, published regularly by Educational Testing
Service. Interested readers are encouraged to consult the aforementioned
references and subscribe to the ETS Bulletin. In addition, a publication devoted to
the description and evaluation of tests keyed to the objectives of elementary
school education (Grades 1-6) is currently available (CSE, 1970).

Other Developments in Current Measurement Practice

In addition to the foregoing trends, at least two other developments in
current early education measurement practice can be cited. The first of these
involves the search for measurements of program variables other than those
expressed solely in terms of pupil or teacher behavior. Three examples can
serve to illustrate this trend. First, a technique has been developed for
assessing the organization of physical space within which early education occurs
(Kritchevsky and Prescott, 1969). This seems particularly useful in view of the
apparently significant, but often overlooked, relationship between spatial
organization and the classroom-playground behavior of children and their teachers
(Prescott and Jones, 1967). Related to this technique are still more comprehensive
attempts to measure educational environments, some of which can be adapted
to the concerns of teacher educators (e.g., Astin and Holland, 1961; Creager and
Astin, 1965), and measurement guidelines for the evaluation of a total school
system (KGS, 196^).

Another example of the aforementioned development is the conceptualization
of measurement criteria for purposes of evaluating instructional materials and
equipment. Notable among specific developments along this avenue of measurement
are procedures designed to measure the reading difficulty, or "readability," of
written materials (Bormuth, 1968; Klare, 1963). Other, more preliminary efforts
in this direction appear promising (e.g., Dick, 1968; Eash, 19^9), although not
yet widely applied in the specific context of early childhood education.

A final example involves the analysis of measurable curriculum dimensions
(e.g., pacing, variety, sequencing, and scope) upon which early childhood
education programs can be compared (Lay and Dopyera, 1971). The application of
this measurement concept to early education research is still only in the
embryonic stage. However, it seems especially suitable for objective assessments
of diverse instructional programs.

A second, broad development in the measurement of variables other than
classroom behavior is represented by attempts to measure institutional change.
This is notably the case within communities that are served by programs such as
Project Head Start (Kirschner Associates, 1970). Among the quantitative
institutional variables amenable to measurement are involvement of the poor in
community decision-making activities, employment of local residents in
paraprofessional occupations, and allocation of resources to the educational and
health needs of poverty and minority groups. This approach to the measurement
of change is significant, if for no other reason than that it encourages one to
assess changes that extend beyond immediate pupil outcomes to possible broad-scale
social benefits of early education programs.

Functions of Measurement in Early Childhood Education

Evaluation pervades virtually every aspect
of early childhood education, including the preparation of teacher personnel.
If evaluation is to be based on data, measurements of one sort or another are
necessary at various points in any program, whether its focus be personnel
training or children and their parents. Points for measurement are both frequent
and critical. For example, important measurement functions at the preservice
level of teacher education often include selection of trainees, diagnosis of
trainee needs, measuring trainee progress and training outcomes, and
predicting in-service teaching success. At the in-service level, measurement
again becomes central in matters of selecting personnel and determining
teacher effectiveness.

For children involved in early education programs, measurement is equally
important. If the readiness principle is to be anything but a sterile cliche,
children's entering behavior along multiple dimensions must be assessed and
the data then used to facilitate individualized instruction where necessary.
Many educators are interested in charting the developmental progress of children
apart from specific curricular experiences and, of course, the degree of
children's progress in relation to explicit curriculum objectives must be
measured in some way.

Measurement is also crucial for parents variously involved in early education
programs. Increased concern is being shown for measuring the quality and
extent of parent involvement in early education, the outcomes of attempts to
provide parent education in matters of child development and family relations,
and parents' satisfaction with their children's participation and progress in
given early education programs.

BASIC PROBLEMS IN MEASUREMENT

Most simply, measurement is the description of data in terms of numbers
(Guilford, 195^). More specifically, measurement involves the assignment of
numerals to objects or events according to certain rules in order to represent
magnitude (Stevens, 1951). Occasionally, one is interested in measurement only
to determine the presence or absence of some property, without further
quantification in terms of "more" or "less" (English and English, 1958). In this
chapter the complexities of rules for assigning numbers, scaling procedures, and
the like are not considered. Interested readers may review such topics elsewhere
(e.g., Stevens, 1951; Nunnally, 19^^). It is important, however, to point out
that any use of measurement in educational settings involves at least three
assumptions: the behavior of children and teachers can be symbolized numerically,
the numerical description of behavior can be analyzed according to certain
mathematical principles, and the results of such analyses can serve as
useful and valid indications of the behavior involved (DuBois and Mayo, 1970).
Once these assumptions are accepted, measurement can proceed. But at least
three basic problems must be solved by anyone concerned with measuring behavior
in an educational program for children or teachers. These are the problems of
what, how, and when to measure (Webb, 1970).

The What of Measurement

The effectiveness of behavioral measurement in training programs is contingent
upon the precision with which training outcomes are specified. That is,
until one decides exactly what it is that a child (or teacher) should be doing
differently as a result of a training experience, it is unlikely that measurement
will be useful for one's intended purposes. At least two difficulties arise at
this level of the measurement problem in practice. First, objectives frequently
border on the intangible, making difficult any consensus about what constitutes
evidence of the desired behavior. Consider, for example, the ambiguities involved
in such kindergarten objectives as "Responsiveness to beauty in all forms" or
"Realization of individuality and creative propensities" (Headley, 1965). Second,
and related to the first, is the frequent tendency to determine what to measure
on the basis of expediency or convenience. That is, instead of gearing one's
measurement policy to relevant program objectives, one opts for measuring what can
easily or readily be measured. In the extreme case, one may not measure at all,
on the grounds that suitable techniques are not available, or that
the "really important goals" are long term and measurement at this time
is therefore inappropriate.

A measurement policy designed exclusively around program objectives may, of
course, be too delimiting. The broader guideline for determining what to measure
concerns any information that is either necessary or useful in (1) making decisions
about programs and their participants; (2) reporting to outside agencies, parents,
and fellow professionals; and (3) charting developmental changes for record-keeping
purposes (e.g., height and weight in young children). Hopefully, a measurement
policy is never based solely on custom or simply because it is the "thing to do."

The How of Measurement 

Once decisions about what to measure have been made, one is faced with the
two-pronged measurement techniques problem: (1) determining the units of
measurement that are most pertinent to tasks for which an individual is being
groomed, and (2) selecting or developing a technique which will yield valid and
reliable measurements (Webb, 1970). Commonly used measurement units range from
speed, amount, frequency, and accuracy to variety, quality, persistence, and
originality. For example, a prospective teacher being trained in the successful
application of classroom management techniques may be required to demonstrate
that she is capable of handling a child's aggression quickly (speed) in several
different ways (variety) that are based upon valid management principles
(accuracy). Similarly, in the case of a child being schooled in creative problem
solving, both persistence and originality, among others, are relevant units of
measurement. Measurement of the acquisition of factual knowledge or extent of
comprehension of concepts and principles obviously calls for attention to both
accuracy and amount. The point is that the unit(s) to be used depend upon the
components of behavior that are focused on in training. Again, this requires a
careful analysis of the behavioral components reflected in program objectives.
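To make the idea of measurement units concrete, the following Python sketch scores the classroom-management example on three units. It is illustrative only: the `Episode` record, its field names, and the scoring rules (mean response latency for speed, count of distinct techniques for variety, proportion of principled responses for accuracy) are hypothetical choices, not a procedure described in the sources cited.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    """One observed incident of a trainee handling a child's aggression (hypothetical record)."""
    seconds_to_respond: float   # time from incident to the trainee's response
    technique: str              # label for the management technique used
    follows_principles: bool    # observer's judgment: consistent with valid principles?


def unit_scores(episodes):
    """Summarize observed episodes on three measurement units (assumed definitions)."""
    if not episodes:
        raise ValueError("at least one observed episode is required")
    return {
        # speed: mean latency in seconds (lower means quicker handling)
        "speed": mean(e.seconds_to_respond for e in episodes),
        # variety: number of distinct techniques observed
        "variety": len({e.technique for e in episodes}),
        # accuracy: proportion of responses judged consistent with principles
        "accuracy": sum(e.follows_principles for e in episodes) / len(episodes),
    }


observed = [
    Episode(10, "redirect", True),
    Episode(20, "time-out", True),
    Episode(30, "redirect", False),
]
print(unit_scores(observed))
```

Each unit answers a different question about the same observations, which is why the choice of unit has to follow from the behavioral components named in the program objectives.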

The second prong of the problem about how to measure concerns specifically
the matter of measurement strategy and technique. By measurement strategy is
meant the method for determining the referents against which an individual's
behavior can be measured. Measurement technique refers to the particular
procedure for describing the behavior, usually in quantitative terms.

Basic Measurement Strategies

Perhaps the most basic distinction in measurement strategy is that between
norm-referenced and criterion-referenced measurement (Glaser, 1963; Popham and
Husek, 1969). A norm-referenced measure is one in which the meaning of an
individual's behavior is derived from the behavior of others on the same measure.
In other words, a comparison of persons whose behavior is measured by the same
device is usually necessary for an interpretation of the behavior. The widely
used Preschool Inventory (Caldwell, 1967) is an example of this approach to
measurement. It is based upon the assumption that individual differences in
intellectual attainments exist among children ages 4½ to 6½. A child's
performance on this test is interpreted by comparing him with other children of
the same age and socioeconomic standing. Thus, a child's performance can be
described as "average," "above average," or "below average" in relation to how
the scores of his comparison group are distributed. The distribution of test
scores or other quantitative data constitutes the basis for test norms; hence
the term norm-referenced measurement. Most standardized measures of intelligence,
academic achievement, and even "personality" are norm-referenced measures.
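The norm-referenced logic can be sketched in a few lines of Python: a child's score acquires meaning only by being located within the comparison group's distribution. The one-standard-deviation band used for the "average" label and the example scores are assumptions for illustration; published instruments such as the Preschool Inventory define their own norm tables.

```python
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


def norm_referenced_label(score, norm_scores, band=1.0):
    """Describe a score relative to the comparison group.

    Assumed rule: scores within `band` standard deviations of the group
    mean are labeled "average".
    """
    spread = pstdev(norm_scores)
    if spread == 0:
        raise ValueError("comparison group scores have no spread")
    z = (score - mean(norm_scores)) / spread  # standard score within the group
    if z > band:
        return "above average"
    if z < -band:
        return "below average"
    return "average"


# Hypothetical comparison group of same-age children
norm_group = [10, 12, 14, 16, 18]
print(norm_referenced_label(20, norm_group), percentile_rank(20, norm_group))
```

The same raw score would earn a different label against a different comparison group; that dependence is precisely what distinguishes norm-referenced interpretation.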

In contrast, criterion-referenced measurement involves determining an
individual's status in relation to some preselected or established standard of
performance. This standard (or criterion), not other individuals, becomes the
referent against which performance is measured and interpreted. Performance tests
such as those involved in obtaining a driver's license and demonstrating swimming
proficiency are examples. Minimal, although absolute, standards of competence
must be demonstrated in order to "pass." Insofar as a given individual is
concerned, the performance of others on the same measures is irrelevant.

In early childhood education circles, the Basic Concept Inventory (Engelmann,
1967) is an example of a criterion-referenced measure. This measure is based on
the assumption that certain basic conceptual skills are critical for successful
early academic progress. It can therefore be used to measure which of these various
skills the child has or has not mastered so that remedial instruction can be
programmed. Or, it can be used to measure the effectiveness of an instructional
program designed to develop mastery of these skills among young children.
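A criterion-referenced use of this kind can be sketched as a mastery profile: each skill is compared with a fixed standard, and other children's scores never enter the calculation. The skill names and the 80 percent mastery criterion below are hypothetical and are not taken from the Basic Concept Inventory itself.

```python
def mastery_profile(responses, criterion=0.8):
    """Split skills into those mastered and those needing instruction.

    responses: dict mapping a skill name to a list of item results
               (True = correct), all for a single child.
    criterion: assumed proportion-correct standard for mastery.
    """
    mastered, needs_instruction = [], []
    for skill, items in responses.items():
        if not items:
            raise ValueError(f"no items recorded for skill {skill!r}")
        proportion = sum(items) / len(items)  # proportion of items correct
        if proportion >= criterion:
            mastered.append(skill)
        else:
            needs_instruction.append(skill)
    return mastered, needs_instruction


# Hypothetical item results for one child
child = {
    "color names": [True, True, True, True, True],
    "counting to ten": [True, False, False, True, True],
    "same/different": [True, True, True, True, False],
}
print(mastery_profile(child))
```

The list of unmastered skills is what would be used to program remedial instruction for that child.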

Among the most recent and comprehensive applications of criterion-referenced
achievement measures is the Individually Prescribed Instruction (IPI)
evaluation program (Lindvall and Cox, 1970). Developed at the University of
Pittsburgh, this program includes four main components: (1) tests for the initial
placement of pupils in the instructional program, (2) pretests in relation to
specific curriculum unit objectives, (3) curriculum-embedded tests to measure
individual pupil progress, and (4) curriculum unit posttests for summative
evaluation. Additionally, non-test information, including data obtained during
personalized pupil-teacher conferences, is used to facilitate the design of
individualized instruction and its evaluation.

The relative merits or weaknesses of criterion- and norm-referenced measurement
strategies are perhaps incidental to the rationale for choosing one or the
other for use in the practical setting. In either case, the choice of strategy
is contingent upon the kinds of decisions one will make from the measures
obtained (Garvin, 1970). Some educational decisions involve the selection of a
"fixed quota" from either the high or low end of a distribution of scores. For
example, teacher trainers may wish to admit only those candidates for training
whose scores on measures of academic competence and attitude toward children
fall in the upper quartile. Or, one may have room for a small number of children
in a compensatory education program and select for the special treatment only
those who score at some point "below average." In both of these examples,
norm-referenced measures would be appropriate. In addition, where information
about the capacity of a given instructional program to increase the range of
individual differences is sought, norm-referenced measurement is also appropriate.
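The two fixed-quota decisions can be sketched as follows. The candidate names and scores are invented for illustration, and the quartile cut uses the default ("exclusive") method of Python's `statistics.quantiles`; other quartile conventions would draw the line slightly differently.

```python
from statistics import quantiles


def upper_quartile(scores):
    """Names whose score is at or above the group's third quartile (Q3)."""
    q3 = quantiles(list(scores.values()), n=4)[2]
    return [name for name, s in scores.items() if s >= q3]


def lowest_k(scores, k):
    """Fill k places with the lowest scorers; ties keep the input order."""
    return sorted(scores, key=scores.get)[:k]


# Hypothetical scores on a selection measure
candidates = {"A": 50, "B": 60, "C": 70, "D": 80,
              "E": 90, "F": 100, "G": 40, "H": 30}
print(upper_quartile(candidates))  # admission from the top of the distribution
print(lowest_k(candidates, 2))     # limited places in a compensatory program
```

Both functions depend only on where each person stands relative to the others, which is why norm-referenced measures suit these decisions.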

If, on the other hand, one's decisions are primarily oriented toward certifying
competence with respect to some a priori standard, then criterion-referenced
measurement is clearly indicated. Training programs whose objectives
are behaviorally defined are those in which criterion-referenced measurement is
natural. In such cases, one is usually most concerned with whether (or what
proportion of) students master a given objective, not how they compare to some
norm group (Sjogren, 1970).
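For behaviorally defined objectives, the group summary mentioned here (what proportion of students master each objective) can be computed directly. The objectives, scores, and the criterion of 8 below are assumed values for illustration.

```python
def proportion_mastering(scores, criterion):
    """Fraction of students whose score meets or exceeds an a priori standard."""
    if not scores:
        raise ValueError("no scores supplied")
    return sum(1 for s in scores if s >= criterion) / len(scores)


def mastery_by_objective(results, criteria):
    """Proportion mastering for each objective (hypothetical data layout)."""
    return {obj: proportion_mastering(scores, criteria[obj])
            for obj, scores in results.items()}


class_results = {"names numerals 1-10": [7, 9, 10, 5],
                 "sorts by shape": [6, 8, 8, 9]}
standards = {"names numerals 1-10": 8, "sorts by shape": 8}
print(mastery_by_objective(class_results, standards))
```

No norm group appears anywhere in the calculation; each student is judged only against the stated standard.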

It should also be noted that norm-referenced measures are often used as if
they were measures of the criterion-referenced type. A case in point is the use
of a conventional intelligence test to "evaluate" the effectiveness of an early
intervention program after children have been taught the test items directly.
Yet norm-referenced measures typically are designed to "spread out" individuals
along some dimension of behavior, and they usually represent only very broad
samples of such behavior at that. Rarely can one find a norm-referenced measure
that reflects in specific ways the objectives of most training programs. This
point will again be considered, but in a different light, later in the chapter.

2 For a discussion of the merits and limitations of criterion-referenced
measurement, see Ebel (1970).

Finally, and in relation especially to criterion-referenced testing,
increased attention is now being given to the formulation of various rate measures
of learning. By this is meant measurements based on the time needed by learners to
achieve specified goals (Carroll, 1970). A consideration of rate measures involves
a number of interacting variables: motivation (perseverance) of the learner,
opportunity to learn, quality of instruction, and learner ability to comprehend
and profit from instruction. These variables pose substantial measurement
problems in themselves. But, essentially, this approach concerns measuring learning
rate in terms of a ratio between amount of knowledge or skill gained and a
specified unit of time. Interested readers should consult Carroll (1970) for details.
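Both forms of rate measure mentioned here, time needed to reach a goal and gain per unit of time, reduce to simple arithmetic once scores are recorded against elapsed instructional time. The sketch below assumes scores are logged as (cumulative hours, score) pairs; it does not attempt to model Carroll's other variables, such as perseverance or quality of instruction.

```python
def learning_rate(pretest, posttest, hours):
    """Score gain per hour of instruction."""
    if hours <= 0:
        raise ValueError("instructional time must be positive")
    return (posttest - pretest) / hours


def hours_to_criterion(sessions, criterion):
    """First cumulative time at which the score reaches the criterion.

    sessions: (cumulative_hours, score) pairs in chronological order.
    Returns None if the criterion is never reached.
    """
    for hours, score in sessions:
        if score >= criterion:
            return hours
    return None


log = [(1, 3), (2, 5), (3, 8), (4, 9)]  # hypothetical learner record
print(learning_rate(pretest=3, posttest=9, hours=4))
print(hours_to_criterion(log, criterion=8))
```

Returning `None` rather than a number for a learner who never reaches the goal keeps "not yet reached" distinct from any measured time.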

Measurement Techniques

Measurement techniques can be classified in many different ways: according
to the content or area measured (e.g., intelligence, interest, motor skill), the
way in which a measure is administered (e.g., group versus individual), response
mode (e.g., paper and pencil, free versus controlled response, verbal or nonverbal
response), scoring method (e.g., subjective and judgmental versus "objective"),
target population (e.g., infants, preschoolers, or teachers), and format
of the measure (e.g., rating scale, performance test, process observation). It
is convenient, however, to conceptualize measurement techniques along a broad
dimension that transcends the foregoing classification schemes, namely,
obtrusive-unobtrusive measurement (Shalock, 1968). By definition, an obtrusive
measure is one in which an examiner or observer is present on the scene and the
examinee or observee is aware that his behavior is being scrutinized. In general,
unobtrusive measures represent the other side of this coin: physical traces
(erosion and accretion), running records, episodic and private records, and the
like. To date these measures have been little used in education, although they are
potentially valuable, especially in combination with appropriate obtrusive measures
(see Webb et al., 1966). The present discussion generally is limited to obtrusive
measures. However, it should be recognized that many observational procedures
(both simple and contrived) herein discussed are essentially unobtrusive. Moreover,
it is even possible that tests, to the extent they become simply another
part of classroom routine, may take on the characteristics of an unobtrusive
measure.

Obtrusive measures can be grouped into at least five broad categories:
interviews, systematic observation, standardized objective measures, standard
projective measures, and teacher-made tests. This writer makes the assumption
that readers are sufficiently familiar with these classes of measures (including
their principal strengths and weaknesses) that extended descriptions of them are
unnecessary. Therefore, only very general comments concerning such classes will
be advanced. The thrust instead will be in the direction of highlighting those
measures that are currently being profitably used in the field or that seem to
hold more than average promise for use by practitioners. This survey is not
limited to measures of children's behavior. Attention is also given to examples
of measures of teacher and parent behavior.

Interviews. Very few reports involving the use of interview measurements
with children in early childhood education programs exist in the current
literature. The clearest exception to this is the liberal use of Piaget's methode
clinique in curricula based on cognitive developmental theory (Kamii, 1971;
Lavatelli, 1970). Such exploratory interviews are semi-structured in that they
are designed, within broad limits, to determine the child's understanding of
observed phenomena related to logical classification, seriation, numerical
construction, conservation, and spatial concepts. According to Kamii and Peper
(1969), the methode clinique differs from psychometric methods thus:

"In the psychometric method, the examiner is required to
follow a standard set of procedures specified in the manual, without
any deviation. The wording of a question cannot be changed,
and the number of times the instruction can be repeated is specified.
In the 'exploratory method,' on the other hand, the examiner
has an outline and a hypothesis in mind at all times, and he tests
these hypotheses by following the child's train of thought in a
natural, conversational way. The examiner uses his ingenuity to make
himself understood by the child in any way possible." (Kamii and
Peper, 1969, p. 13).

For educators interested in the child's conceptualization of Piaget-based 
tasks, this technique has the potential of yielding information not accessible 
in any other way. However, this comment is based on the dual assumption that
an examiner will execute the méthode clinique correctly and will not be
deceived by the child's language or by his own biases.

Aside from this application, only a scattering of reports of formal inter- 
view measurement is apparent in the literature. Perhaps the most novel of these 
reports concerns the development of a standardized telephone interview procedure
for obtaining speech samples from young children. Especially promising results
from the use of this technique with disadvantaged children have been reported
(IDS, 1968).

Interview measurement can also be helpful in the study of teacher behavior.
Currently, however, the interview is largely restricted to two program aspects:
selection and program evaluation. The interview has long been a popular
procedure both for selecting candidates for teacher training and for hiring
teachers for existing school programs. One is forced to look long and hard,
however, for evidence to indicate that interview data alone predict success in
either venture. Undoubtedly this is due in large part to the difficulty of
finding anything in common between interview measurements and in-process
teacher behavior. Periodic interviews with teachers-in-training can be useful
as a feedback mechanism for training program effectiveness, although
surprisingly few examples of this can be found in the current teacher
education literature.

Perhaps the most extensive use of interviewing has been in the study of
parental child rearing practices. For example, this method has been used
profitably in recent years to measure parents' beliefs and perceptions about
themselves as parents, including such things as child rearing philosophy and
preferred disciplinary practices (Baumrind, 1968). Within the context of early education
programs, elaborate and promising interview methods have also been included in 
broad scale evaluations of Head Start and Project Follow Through such as those 
executed by agencies like Stanford Research Institute and the Educational Test- 
ing Service. 

Systematic Observation. Observational techniques need not always be
obtrusive. For example, observation conducted through one-way mirrors or by way
of videotape recording for later analysis is unlikely to distract or otherwise
affect the behavior of children or teachers being observed. However, outside
of the psychological laboratory or child-research nursery school, such devices
are rarely available (or used). Observation more typically occurs in the
presence of children and teachers. Even then, systematic, direct observation
methods are problematical, especially with respect to reliability and the
control of situational variables. Nevertheless, it is this writer's opinion that
some of the most promising recent advances in measurement technique have been
made in the area of systematic observation. For example, techniques of
observation from the study of operant conditioning offer a great deal to persons
concerned with measuring the effects of cueing procedures and reinforcement
contingencies on response rate or frequency (see, for example, Honig, 1966; Baer,
Wolf, and Risley, 1968; and Weiss, 1968). A major work on observation couched
in the science of ethology is also a must for students of observational
technique (Hutt and Hutt, 1970).
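Because reliability is the chief difficulty named above for direct observation, a common check is to have two observers code the same intervals independently and compare their records. The sketch below computes simple percent agreement and Cohen's kappa, which corrects agreement for the amount expected by chance. Kappa is offered here as one widely used index, not as the method of any study cited above; the "on"/"off" task codes and the ten-interval records are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of intervals on which two observers recorded the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance alone."""
    p_observed = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of the two observers' marginal proportions per code.
    p_chance = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    if p_chance == 1:
        # Degenerate case: both observers used one and the same code throughout.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes for ten intervals: "on" = on task, "off" = off task.
observer_1 = ["on", "on", "off", "on", "on", "off", "on", "on", "on", "off"]
observer_2 = ["on", "on", "off", "on", "off", "off", "on", "on", "on", "on"]

print(percent_agreement(observer_1, observer_2))
print(round(cohens_kappa(observer_1, observer_2), 3))
```

Here the observers agree on 8 of 10 intervals, but because both coded "on task" most of the time, much of that agreement could arise by chance, so kappa comes out noticeably lower than 0.8.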

Methods for recording data obtained through observation include diary records,
checklists, rating scales, rate and frequency counts, and anecdotal records.
Details of these methods are given in many sources (e.g., Adams, 1964; Furth,
1958; Payne, 1968; Stanley, 1964), which should be consulted by readers
unfamiliar with such methods. For the present, some basic features of
observational systems suitable for use in early childhood education will be
mentioned.
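Of the recording methods just listed, rate and frequency counts are the simplest to summarize. The following is a minimal sketch, assuming each observation session is logged as a count of target responses plus the minutes observed (the field names and figures are hypothetical). Converting counts to responses per minute lets sessions of unequal length be compared, for example across a baseline phase and a reinforcement phase of the kind used in the operant studies cited above.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Session:
    phase: str        # hypothetical label, e.g. "baseline" or "reinforcement"
    responses: int    # frequency count of the target behavior
    minutes: float    # length of the observation period

    @property
    def rate(self) -> float:
        """Responses per minute, so sessions of unequal length are comparable."""
        return self.responses / self.minutes

def mean_rate_by_phase(sessions):
    """Average per-session rate within each phase, in the order phases first appear."""
    phases = {}
    for s in sessions:
        phases.setdefault(s.phase, []).append(s.rate)
    return {phase: mean(rates) for phase, rates in phases.items()}

log = [
    Session("baseline", 6, 20), Session("baseline", 9, 30),
    Session("reinforcement", 18, 20), Session("reinforcement", 25, 25),
]
print(mean_rate_by_phase(log))
```

A raw frequency count would rank the 30-minute baseline session above the 20-minute one; the rate shows that the behavior actually occurred at the same pace in both.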

Traditionally, systems for observing young children have focused on
the individual child, his social and problem-solving skills, play activities,
and interactions with materials in the classroom (Simon and Boyer, 1970). An
example of a recently developed system for individual child observation is the
Personal Record of School Experience, or PROSE (Medley, 1969). This system
involves no rating, but simply the objective recording of observable events as
they occur. One child at a time is observed, and all of his activity is
recorded by means of a manageable coding system based on 11 categories of
behavior (e.g., level of attention, manifest affect, and physical activity).
Static conditions such as class organization, subject matter, and instructional
materials in use can also be recorded. Coded data may be computer-analyzed. The
PROSE is based upon the principles of OScAR, a widely known system for observing
teacher-pupil interaction (Medley, 1963).

Other examples of new observational schemes include systems for assessing
preschool classroom environments (Stern and Gordon, 1967), nine aspects of young
children's classroom behavior (Katz, 1968), and social behavior in natural
settings (Honig et al, 1970; Cunningham and Boger, 1971). The Stern and Gordon
(1967) inventory of checklists and scales is notable for its comprehensiveness.
Categories for measurement include (1) physical environment, materials, and
equipment; (2) program structure, balance, and organization; (3) play
activities; (4) predominant teaching mode; (5) role of the teacher regarding
verbal and nonverbal communication; (6) group control and management; (7)
teacher involvement in children's social relations; (8) classroom atmosphere;
(9) teacher "style and tone"; and (10) general aspects of the teacher's
relationship with children.

Systems for observing the relative strengths and abilities of teachers
continue to be developed. A good example is Brown's (1970) Teacher Practices
Observation Record. This instrument is being used in several Project Follow
Through settings as an aid in developing effective behavior in both teachers
and teacher aides. However, the most extensive developments in classroom
observational technique are based on the concept of interaction analysis. The
impetus for such developments has been largely provided by the Flanders'
Interaction Analysis Technique (Flanders, 1966). The application of this
technique requires that an observer keep a running record of teacher-pupil
exchanges in three-second intervals. These exchanges can be tabulated according
to categories of behavior that range from direct (e.g., giving commands,
lecturing, justifying authority) to indirect teacher influences (e.g., accepting
pupils' ideas and feelings, praising). Provision for recording the extent of
pupil talk is also made. The resulting data can be used to analyze prevailing
patterns of teacher-student interaction and the relationship of such patterns
to pupil achievement and attitudes.
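The tabulation step described above can be sketched in a few lines. The sketch assumes the commonly described ten-category layout of the Flanders system (1-4 indirect teacher talk, 5-7 direct teacher talk, 8-9 pupil talk, 10 silence or confusion), with each code standing for one three-second interval. The interval record is hypothetical, and the indirect/direct ratio and pair tally shown are common summaries, not the only ways such data are analyzed.

```python
from collections import Counter

# Assumed grouping of the ten Flanders categories.
INDIRECT = {1, 2, 3, 4}   # accepts feeling, praises, uses pupil ideas, asks questions
DIRECT = {5, 6, 7}        # lectures, gives directions, criticizes or justifies authority
PUPIL = {8, 9}            # pupil talk: response, initiation
SILENCE = {10}            # silence or confusion

def summarize(codes):
    """Summarize a running record of category codes, one per three-second interval."""
    if not codes or any(c not in range(1, 11) for c in codes):
        raise ValueError("codes must be a non-empty list of integers 1-10")
    counts = Counter(codes)
    total = len(codes)
    indirect = sum(counts[c] for c in INDIRECT)
    direct = sum(counts[c] for c in DIRECT)
    pupil = sum(counts[c] for c in PUPIL)
    return {
        "intervals": total,
        "pupil_talk_share": pupil / total,
        # Ratio of indirect to direct teacher talk; undefined if no direct talk occurred.
        "indirect_direct_ratio": indirect / direct if direct else None,
        # Consecutive pairs of codes, as tallied in an interaction matrix.
        "transitions": Counter(zip(codes, codes[1:])),
    }

record = [5, 5, 4, 8, 2, 3, 5, 6, 8, 9, 10, 4, 8, 2]
summary = summarize(record)
print(summary["intervals"], summary["indirect_direct_ratio"], summary["pupil_talk_share"])
```

The pair tally is what makes sequences visible: in this record, a teacher question (4) is followed by pupil response (8) twice, and pupil response is followed by praise (2) twice.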

Much research based on the Flanders system has accumulated in the past
decade. For recent reviews, see Nuthall (1970) and Garrard (1966). Thus, if
nothing else, this approach has been of great heuristic value. However, the
technique also has immense practical value for guiding teacher behavior. Mild
criticism has been levelled at the Flanders system because of (1) its exclusive
focus on verbal classroom behavior (especially of the teacher) and (2) its
predominant concern with affective components of classroom behavior. But
several extensions of this technique have been made which merit consideration
by those involved in measuring and evaluating teacher behavior. These
extensions include greater provision for cognitive factors (Amidon, 1966;
Reynolds et al, 1971) and nonverbal classroom interaction (French and Galloway,
1968). Perhaps the single most important finding from interaction analysis
research for early childhood educators is that student teachers taught
interaction analysis are generally more indirect, supportive, and accepting of
their pupils than are student teachers unfamiliar with this approach (Amidon,
1967).

Space limitations do not permit an elaborate review of all the exciting
developments in the area of classroom observation. However, a few additional
systems deserve brief mention. These include systems designed especially for
assessing student teachers (Sharpe, 1969), teacher skill in classroom management
(Soar et al, 1971), a process approach to teachers' question-asking behavior
(Zimmerman and Bergan, 1968), and the Behavioral Analysis Instrument for
Teachers (1969). The latter is particularly useful for describing teacher skill
in pedagogical technique, curriculum planning, and pupil evaluation-diagnosis.
Perhaps the most comprehensive of all the newer systems is the Classroom
Observation Instrument (SRI, 1970) developed for use in evaluating Project Head
Start and Follow Through. Finally, interested readers are encouraged to examine
carefully Simon and Boyer's (1970) anthology of 79 classroom observation
systems, and two major publications about the role of systematic observation in
assessing and improving classroom behavior (Brown, 19^9; Gallagher, Nuthall,
and Rosenshine, 1970).

There has been no extensive use of systematic observation techniques for
measuring parental behavior in connection with early childhood education. There 
are at least two reasons for this. First, it is only recently that educators 
have become sensitive to the role of parents in the formal early education 
enterprise. Second, it is extremely difficult to arrange for such observation, 
either in homes or schools. It is therefore much more common for parental 
behavior to be measured by interviews, questionnaires, and checklists. However, 
the potential of observing parent-child interaction to measure such things as 
parental teaching styles, the quality of parent-child relationships, and home 
stimulation cannot be overlooked. The value of such an approach is well
illustrated by the work of Hess and Shipman (1965), Bee et al (1969), Brophy
(1970), Schmidt and Hore (1970), and STIb: (1969). Observational procedures
have also been utilized in evaluating maternal inservice training associated
with preschool intervention programs (Hamilton, 1971).

A word about rating scale methods for tabulating observational data is 
also in order. It is clear from the literature that ratings of teachers by 
supervisors and of children by teachers continue to be popular and expedient 
means for measuring classroom behavior. For example, a recent survey of 53 of 
the nation's 60 largest school districts revealed that 50 of these districts 
currently use some type of rating scale to measure teacher performance (Queer, 
1969). Problems and procedures associated with these and other techniques for
measuring faculty instructional effectiveness are discussed by Blair (1968)
and Cohen and Brawer (1969). Concerning effectiveness among teacher educators,
it is not surprising that primary factors in "good" instruction include (1)
coursework in which objectives are clearly defined, (2) a classroom atmosphere
conducive to student ease, and (3) a tolerant, responsive instructor who
demonstrates both competence and enthusiasm (Bannister, 1961). Such
characteristics would seem also to apply to teachers of young children.
Research continues to highlight the importance of qualities such as empathy,
nurturance, and communication skill for early childhood educators. For cues
concerning the measurement of such qualities see Hogan (1969) and O'Leary and
Becker (1969).

The problems of reliability and validity inherent in rating scale approaches
to measurement are well known. But the use of well-designed scales by
experienced teachers for measuring children's behavior is often beneficial.
Two examples illustrate such benefit. First, a rating scale adapted from the
face sheet of the Stanford-Binet Intelligence Test form has been reported as
being extremely useful in measuring three important motivational
characteristics of children: achievement motivation, confidence in ability, and
activity level (Hess et al, 1966). Second, promising rating scale devices for
predicting kindergarten and primary grade academic achievement, learning
difficulties, and behavior problems have been reported (Attwell et al, 1967;
Conrad and Tobiessen, 1967; Gross, 1970).

Rating scale methods also figure heavily in efforts to measure "socialization"
of bilingual and ethnic group children (e.g., Cervenka, 1968), teachers'
estimates of social competency among preschool and elementary school children
(Levine and Elzey, 1968; Seagoe, 1970), and infant development (Hoopes, 1967).

Standardized, Objective Measures. This class of measures includes instruments
for the measurement of intelligence and aptitude, achievement, personality,
attitude, and interest. Such measures commonly appear in the form of tests that
are constructed, administered, and scored according to prescribed rules (Brown,
1971). These rules govern the selection of item content, instructions for
giving and taking the test, and recording and evaluating test responses.
Strictly speaking, such tests are limited to the measurement of behavior in the
specific test situation, a situation which is usually contrived. Consequently,
any statements or conclusions about the person being tested represent an
inference from that situation to the general class of responses presumably
sampled by the test. That is, by way of inference one generalizes from a sample
of test behavior to the broader characteristic(s) of the individual. This is
particularly true of norm-referenced measures. Finally, it should be noted that
not all tests are limited to a paper-and-pencil format, nor do all tests
require formal arrangements. In this sense, any measure of performance can be
called a "test." However, this section of the chapter is concerned largely with
formal, obtrusive tests administered either individually or in groups.

With respect to standardized, objective measures of children's behavior,
only a few points are made here. The reader is referred to [...] for an
annotated bibliography of commercially available tests and scales for use with
children; a handbook of such measures not commercially available has been
published elsewhere (Johnson and Bommarito, 1971). First, conventional
measures of mental ability continue to be used extensively for purposes of
diagnosing developmental status, guidance, and measuring the effects of therapy
(Stott and Ball, 1965). By far the most frequently used measure of young
children's intelligence is the Stanford-Binet. Other widely used measures are
Goodenough's Draw-a-Man, the Wechsler Intelligence Scales for preprimary and
school-age children, the Gesell Schedules, the Cattell Infant Scale, the Ammons
Picture-Vocabulary, and the Merrill-Palmer Scale. For complete reviews of
these and other conventional scales see Stott and Ball (1965).

Second, alternatives to conventional measures of intelligence are finding
favor among many psychologists and educators (Achenbach, 1970). This is
particularly true for those who have been attracted to Piaget's
cognitive-developmental theory of mental development. Piaget-based scales for
use as early as infancy have been developed (e.g., Uzgiris and Hunt, 1969;
Honig and Lally, 1970). Other scales for the measurement of precausal thinking,
object permanence, classificatory development, and conservation have been
devised (Laurendeau and Pinard, 1962; Décarie, 1965; Kofsky, 1966; Goldschmid
and Bentler, 1968). A critical examination of this developmental approach to
the measurement of cognition and its implications for practitioners has been
provided by Sullivan (1967).

Third, there has been a tremendous surge of interest in the measurement of
children's language competence since the advent of federal compensatory
education programs. Newly developed language measures are appearing regularly
in the literature, many of which are used to measure the outcomes of language
training programs for disadvantaged and minority group children (e.g., Bierly,
1971; Mehrabian, 1970; Stern and Gupta, 1970). Further, research workers are
beginning to explore how standardized testing procedures may be altered to
assess better the language skills of disadvantaged preschoolers. For example, a
modified Peabody Picture Vocabulary Test procedure has been devised whereby
three important variables (expectancy for success, reinforcement, and
specificity of task instructions) are accounted for in the test administration
(Ali and Costello, 1971). The net effect of this modification has been positive
in terms of enhanced test scores for preschool children who otherwise may
respond less well under "conventionally standard" conditions.

Fourth, the influence of humanistic psychologies is apparent in the now
widespread concern for children's affective development among early childhood
educators. Unfortunately, the validity and other technical features of most
measures of children's affect are unimpressive, if not poor. According to
Hoepfner (1970), few worthwhile measures of achievement motivation, interest,
activity level, and self-esteem are available. Paradoxically, these are among
the phenomena about which some educators are most concerned. In this writer's
opinion, however, genuine attempts to develop better measures in these areas
are becoming both more frequent and more fruitful (e.g., Adkins, 1968; Bolea,
1970; and Soares and Soares, 1969).

Not surprisingly, a majority of these attempts have focused on self-concept
measures. An annotated bibliography of currently available measures of this
construct designed for use with young children can be obtained through the
ERIC Clearinghouse on Early Childhood Education (Coller, 1970). Unfortunately,
most of these measures are marked by serious limitations: they invite socially
desirable responses, depend heavily on young children's verbal facility, and
use terminology whose meaning is subject to wide differences in
interpretation. In view of such limitations, some research workers (e.g., Long
and Henderson, 1970; Yeatts and Bentley, 1971) have experimented with a
nonverbal approach to self-esteem with modestly encouraging results. Other
pertinent resources relevant to measurement in the affective domain are Beatty
(1969), Bloom et al (1971, Chapter 10), and Eiss and Harbeck (1969). The latter
two sources in particular deal with the knotty problem of affective objectives.

Finally, it should be noted that, apart from experimental programs in
early childhood education, the systematic use of standardized, objective
measures by nursery and kindergarten teachers apparently is not extensive. For
example, Goslin et al (1965) report little use of tests beyond reading
readiness and individual intelligence tests at the kindergarten level. Gross IQ
data are of little use to teachers faced with the complexities of educational
planning (Weisworth, 1969). Even the results of reading readiness tests, when
obtained, are not often put to good use (Goslin, 1967). These practices can be
traced to at least two causes. One is the limited number of educationally
useful tests available to teachers in the past. Fortunately, this state of
affairs is rapidly changing. Another may be that teachers of young children
simply are not trained to use such measures, including their selection,
administration, and interpretation. In this writer's judgment, training along
these lines is important at both the pre-service and in-service levels. Such
training for both testing and systematic observation conceivably can promote
greater teacher initiative, cooperation, and responsibility concerning
classroom measurement practices. As teacher involvement increases, it seems more
likely that classroom measurements will be put appropriately to use. Certainly
educators should not allow tests to be administered and interpreted by
untrained personnel.

While the need for teacher skill in test selection, administration, and
interpretation is critical, a precautionary word is in order. As Carroll (1970)
has observed, standardized tests can be overused and too much reliance placed on 
their results. For Carroll (1970) the problem is twofold: First, a given stan- 
dardized test may not be sufficiently appropriate to the particular learning 
tasks in a local curriculum; and, second, the overall score or grade level index 
derived from standardized test performance may be inadequate for determining 
what specific skills have and have not been well acquired by a student. These 
limitations must be kept in mind by practitioners who elect to use standardized 
tests. 
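Carroll's second point can be made concrete with a small hypothetical case: two pupils earn the same total score on a ten-item test, yet the items they passed fall under different skills. The item-to-skill grouping and the response patterns below are invented for illustration only; a real test would need its own validated classification of items.

```python
# Hypothetical grouping of a ten-item test into two skills (item indices 0-9).
SKILLS = {
    "letter names": [0, 1, 2, 3, 4],
    "rhyming": [5, 6, 7, 8, 9],
}

def skill_profile(responses):
    """Proportion correct within each skill; responses holds 1 (right) or 0 (wrong)."""
    return {
        skill: sum(responses[i] for i in items) / len(items)
        for skill, items in SKILLS.items()
    }

pupil_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
pupil_b = [1, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for name, responses in [("A", pupil_a), ("B", pupil_b)]:
    # Same overall score, different pattern of skills acquired.
    print(name, sum(responses), skill_profile(responses))
```

Both pupils score 6, but one has mastered letter names and not rhyming while the other shows the reverse, which is exactly the information an overall score or grade-level index conceals.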

Consider next the use of standardized, objective measures of teacher
behavior. Like interview measures, tests and scales for the measurement of
teacher behavior are most frequently used to select teachers and to predict
instructional effectiveness. Occasionally, they are used to assess the effects
of teacher training. Regardless, their use is more extensive at the pre-service
than at the in-service level. Published tests that have received more than
occasional use include the California Psychological Inventory (Gough, 1968),
the Minnesota Test of Teacher Attitudes (Yee and Fruchter, 1971), the Teacher
Preference Schedule (Stern and Masling, 1958), the Watson-Glaser Test of
Critical Thinking (Watson and Glaser, 1952), and the Tennessee Self-Concept
Scale (Fitts, 1965). Still other measures of interest in the study of teacher
behavior can be cited. One of more than average usefulness for prediction
purposes is addressed to teachers' beliefs about learning and teaching and the
effect of such beliefs on classroom atmosphere (Harvey et al, 1966). Measures of
teacher knowledge and ability to apply principles of good teaching in simulated
problem situations have been developed (Popham, 196^; Murray, 1969); and a
method of assessing teacher attitudes toward children's behavior problems is
available (Tolor et al, 196?).

The value of measures such as these depends on the purpose for which they 
are being used. It appears that more mileage can be obtained by employing a 
systematic observational approach, especially if some indication of teaching 
effectiveness is sought. More frequently than not, a low, positive, and
statistically insignificant relationship is obtained between performance on
paper-and-pencil tests and teacher behavior as perceived by disinterested
classroom observers. The issue here concerns the degree of correspondence
between observed skill and verbalized beliefs, attitudes, and professed
knowledge about teaching. One of the more promising steps toward measuring the
degree of correspondence between teacher intentions and actual practices has
been taken by Steele (1969). The resultant technique appears useful in
determining the extent to which an instructional treatment is stably executed.

The common validity problem of paper-and-pencil tests of teaching skill is
in part responsible for the development of performance tests of teaching
effectiveness. Microteaching is one example of a technique that can be used to
obtain some measure of actual performance. Still other performance approaches
to the measurement of teaching proficiency have been attempted (e.g., Popham,
1971; Moody and Bausell, 1971). However, such approaches have usually failed to
differentiate experienced, formally trained teachers from inexperienced
non-teachers. Perhaps the performance tests are faulty, but it is possible that
such results indicate the inadequate nature of many teacher education programs.

Finally, standardized, objective measures of student opinion, attitudes toward 
instruction, and achievement are being increasingly used as indications of teaching 
effectiveness. However, this occurs mainly at the college level, where preservice
teachers rate or otherwise evaluate their teaching faculty. As yet, little work has
been done to develop measures of preschool or early school pupil reactions to teachers. 
Strickland's (1970) report of explorations with a school attitude questionnaire for 
young children is a notable exception. No attempt will be made here to review 
the vast literature of student evaluation of teaching. Sources of information about 
measurement in this area include Davidoff (1970), Evans (1969), Hayes (1968),
Hoyt (1969), Justiz (1969), Lewis (1966), McKeachie (1969) and Paraskevopoulus (1968). 

Parents, understandably, are not much tested in connection with early
childhood education programs. When they are, it is usually in the form of
scales to measure attitudes toward child rearing practices and education or
perceptions of themselves and their children in relation to training objectives
(e.g., IPLET, 1969). The measurement of parent attitudes has a long history
which has involved the development of a variety of scales for research use,
some of which conceivably could be put to good use by educators (Baumrind,
1967; Lorr and Jenkins, 1963; Schaefer and Bell, 1958). Yet most measures of
this kind are beset with problems of both validity and reliability. Caution in
their application is therefore warranted. Measurement procedures that have been
used in home teaching and parent involvement projects are discussed by Kemble
(1969), Orhan and Radin (1969), and Weikart and Lambie (1968). Finally, a scale
designed to assess parent attitude change in relation to community action
programs has been devised by Hanson, Stern, and Kitana (1968).

Teacher-Made Tests. This category of measurements includes short-answer,
objectively scored tests, essays and other written documents, and many pupil
products (e.g., art work, written materials, constructions, and various
classroom projects). Most readers are familiar with such measures. Their
nature, construction, and use are described in any basic textbook about
educational measurement. Comments about teacher-made tests are therefore
limited here to three incidental points.

First, behavioral objectives in any program of instruction can in effect
themselves become measurements of the criterion-referenced type. That is, if
one describes (1) an individual's behavior that is to be performed together
with (2) the context conditions of performance specifically enough so that the
behavior can (3) be recognized when it occurs, then one's measurement task is
straightforward: observe and record the behavior. However, it is usually
necessary also to specify a desired minimal level of performance (Mager, 1962).
Such a suggestion is especially appropriate for those teachers who design their
instructional programs around a mastery concept of achievement (see Block,
1971).
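An objective written to this pattern can be checked mechanically once performance has been recorded. The following is a minimal sketch, assuming an objective is stored as its behavior, its conditions, and a minimum proportion of successful trials; the field names, the 80 percent criterion, and the sample objective are hypothetical illustrations, not taken from Mager or Block.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Objective:
    behavior: str          # (1) what the child is to do
    conditions: str        # (2) the context in which it is done
    min_proportion: float  # minimal level of performance, e.g. 0.8 of trials

def has_mastered(objective: Objective, trials: list[bool]) -> bool:
    """(3) Recognize the behavior: compare the recorded trials with the criterion."""
    if not trials:
        return False  # nothing recorded yet, so no judgment of mastery is made
    return sum(trials) / len(trials) >= objective.min_proportion

sort_by_color = Objective(
    behavior="sorts blocks into groups by color",
    conditions="given a mixed set of red, blue, and yellow blocks",
    min_proportion=0.8,
)
recorded = [True, True, False, True, True]
print(has_mastered(sort_by_color, recorded))
```

Four successes in five trials meet the 0.8 criterion; the judgment depends only on the child's own record against the stated standard, not on comparison with other children.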

Second, in this writer's judgment, the potential of pupil products for
measuring developmental progress, including academic progress, is frequently
underestimated. However, there is some indication that a pupil product
orientation is preferred even to conventional testing by many educators,
especially those who identify with "open education."³ For example, such
educators maintain that the "best measure of a child's work is his work"
(Barth, 1969). Any meaningful application of this principle obviously requires
that careful records of children's work be kept. An analysis of the cumulative
change in children's work on a longitudinal basis is also necessary.
Admittedly, relatively "informal" measurements are extremely limited for
research purposes. But children's conceptual functioning, problem solving
skills, and aesthetic expression can all be revealed in unique ways by
activities that result in pupil products of various kinds.

Third, and finally, there is a great need for improvement in the test-making 
skills of teachers in early childhood education programs. Too often, teacher 
education programs require no course at all or require only a general course 
about tests and measurements in which descriptive statistics and item writing
are stressed. Since the principal focus of such courses usually is on tests 
that require literacy, prospective early childhood personnel often see them as 
irrelevant. In short, more attention is needed to the development of skills in 
constructing checklists, tests of sensory discrimination and vocabulary, proce- 
dures for evaluating pupil products, and possibly even interview measurements 
among prospective preprimary teachers. Combined lack of skill in measurement 
technique and lack of understanding of how measurements can be used to facilitate 
instruction may also explain why teachers of young children often fail to incor- 
porate a measurement perspective into their educational programs. 

Fortunately, useful resources specifically concerned with the construction
of informal measurement procedures are beginning to appear. For example,
assessment tools for teachers of pre-primary children who have various language
and learning disorders are described by Bangs (1968). Bangs also provides
curriculum suggestions once assessments are made. Another resource devoted to
informal educational measurement has much to recommend it (Smith, 1969). This
author deals with the areas of perceptual-motor development, reading and
arithmetic skills, handwriting and spelling, speech and language disorders, and
personal-social behavior.

³ Ostensibly, the open education movement represents more a commitment to the
process of learning, including the enhancement of cognitive processes.
Ultimately, however, some evidence of process (as reflected in the child's
behavior, a product of some kind) is necessary for evidence of process.

Other Measurement Techniques. Thus far nothing has been said about projective
measures, the measurement of social relations (sociometry), creativity
measurement, and the medical approach to behavior measurement, including
biological structure and function.⁴ In a pragmatic sense, there is good reason
for these "oversights." Projective measures of personality, for example, are
rarely used outside the clinical setting. Even many clinicians seemingly have
become disenchanted with projective techniques because of their low validity.
Moreover, teachers are not trained to administer and interpret projective
measures; nor, in this writer's estimation, should they be. Readers interested
in the use of projective techniques with children are referred to Levine (1966)
and Blum (1968).

⁴ Measurement considerations associated with still another concept, cost-benefit
analysis, are not dealt with in this paper. For an introduction to this
approach see Alkin (1970).

Sociometry has made a unique contribution to our understanding of social
phenomena such as popularity and friendship, peer acceptance and rejection,
leadership and influence power, group roles, and the relationship of sociability
to school achievement. Insights concerning these phenomena have come mainly
from the study of children beyond the preprimary level. But the successful use
of sociometrics with nursery and kindergarten children has been reported (e.g.,
Northway, 1969a; 1969b; Hartup, 1970). Even so, sociometrics seem largely to be
utilized by child development research workers, not by teachers and
psychological specialists in the public schools. There are probably a number of
reasons for this. One is lack of knowledge about and skill in using sociometric
devices on the part of teachers. Another is that early childhood programs,
while ostensibly devoted to promoting children's social development,
infrequently reflect specific goals that call for systematic measurement in
this area. Still another is the occasional ethical objection raised in
connection with sociometrics, that is, a reluctance to "meddle" in children's
social lives. The irony of this should be self-evident. Regardless, in this
writer's estimation, the potential of sociometrics for gaining a better
understanding of children's social perceptions, competence, and acceptance has
not been much capitalized upon by educators. Again, interested readers are
referred to other sources for a more comprehensive treatment of sociometric
theory and technique (Gronlund, 1959; Northway and Weld, 1957).

Creativity is a much discussed, but little understood, characteristic of
human behavior. Not surprisingly, most empirical approaches to the measurement
of creativity lack both a consensus about the behavior being measured and
technical refinement (Tryk, 1968). Despite the measurement controversy, one
cannot help being impressed with the vast amount of creativity research that has
accumulated in recent years.5 Most such research has been with older children,
youth, and adults. More recently, however, considerable effort has been devoted
to measuring young children's creative potential under experimental conditions.
For example, Starkweather (1966) has devised behavioral tasks deemed relevant to
such potential; these tasks are purported to measure psychological freedom,
willingness to try difficult tasks, curiosity, and originality. Ideational
fluency, rate, uniqueness, expressive freedom, productivity, and communicability
are criteria variously stressed in still other recent attempts to measure
creativity in early childhood (Ward, 1969a, 1969b; Gross and Marsh, 1970; Singer
and Whiton, 1971). Finally, it is claimed that certain portions of the Torrance
Tests of Creative Thinking are appropriate for children as young as age four
(Torrance, 1966). For an overview of problems involved in measuring young
children's creativity, see Starkweather (1964).

5 For a recent and comprehensive review of creativity research see Wallach
(1970).

Early childhood personnel concerned with the assessment of physical growth,
physiological functioning, sensory awareness, and the like should consult the
following sources for pertinent surveys of measurement technique and research
methodology: Eichorn (1970), Kaye (1970), Macy and Kelly (1960), Meredith (1960),
Reisen (1960), and Tanner (1970). Diverse approaches to the measurement of
temperament, persistence, curiosity, impulse control, and reflectivity also merit
the attention of educators. Not only are these measurable characteristics related
to young children's school achievement, but they may also serve as indicators of
affective development (Banta, 1970; Kagan, 1965; Maccoby et al., 1965; Maw and
Maw, 1970; Thomas, Chess, and Birch, 1971). Finally, the Illinois Test of
Psycholinguistic Abilities (1970) should be noted for its use as a frame of
reference in diagnostic-prescriptive teaching.

The When of Measurement 

Thus far, the what and how of measurement have occupied most of our attention.
In theory, the question of when to measure should be answered according to one's
evaluation plan. In the previous chapter, much was said about formative and
summative evaluation. Both types of evaluation obviously call for measurement of
one sort or another. There is little reason to elaborate further on this matter
except to note that, in educational practice, most analyses require measurements
taken at a minimum of two points in time. Too often, measurement occurs only in
the context of summative evaluation, with little attempt to assess entering or
baseline behavior.

The distinction between immediate and long-term measurement of behavior also
bears on the when question. In most training programs, measurement is limited
to immediate outcomes: performance during and/or at the end of a given program
(Webb, 1970). This emphasis is usually accompanied by the tacit assumption that
performance at such times is an appropriate, if not good, indicator of how well
one will perform over the longer haul. If the skills being learned during
teacher training, for example, are directly linked to eventual classroom
teaching performance, then the assumption is valid. Otherwise, performance
measured during training may be only remotely related to on-the-job
proficiency.

In the case of both teachers in training and children in early education
programs, a basic question is whether short-term measurements capture anything
more than behavior developed over the short term. Related to this is the problem
of determining how soon one can expect program effects to be realized; to this
problem can be added that of determining with certainty how stable and durable
the behavior changes brought about by programs are (Caro, 1971). The ideal
procedure would involve repeated, if not continuous, measurement of program
"output variables" (Caro, 1971). Unfortunately, one-time measurement in
connection with immediate-term summative evaluation is probably more the rule
than the exception in actual practice. However, systematic follow-up measurement
is a sound and potentially enlightening policy for both teacher educators and
teachers of young children.

SELECTING AND EVALUATING MEASUREMENT TECHNIQUES

In the earlier discussion of criterion- and norm-referenced measurement, the
point was made that measurement strategy should be chosen on the basis of the
decisions the resulting measurements are meant to inform. This decision-making
process necessitates a careful consideration of program objectives. Objectives
are also instrumental in the selection of measurement techniques. Broadly
speaking, there are at least two ways in which this selection can be made. First,
measurement techniques can be determined in advance directly from specific
program objectives. If such objectives are unique, then measurement techniques
often must be designed locally. That is, existing tests, scales, or other
techniques simply may not be valid or otherwise suitable for the measurement of
one's instructional objectives.

Second, one can approach this selection problem with only broad program goals in
mind, then select "off the shelf" some existing measurement technique(s) for
one's purposes. If so, the measurements obtained become (1) ipso facto
instructional objectives, (2) a sample of behavior that may be only spuriously
related to the content and objectives of one's curriculum, or both.

The first procedure is generally preferable to the second, simply because it
demands clear, systematic thinking about the exact purposes of a given
instructional program. However, as suggested earlier in the chapter, measurement
need not be limited to a narrow set of instructional objectives. One is often
interested in determining change or outcomes for which specific curriculum
experiences were not arranged. For example, a teacher might be specifically
concerned with measuring pupil progress in language skill development brought
about through pattern drill. But this same teacher may also wish to measure the
extent to which children become more or less anxious about school during language
instruction, even though the curriculum per se has no formal provision for
modifying anxiety level. This notion is reflected in the thinking of others
(e.g., Caro, 1971) who recommend that any possible unintended program effects,
including undesirable ones, should be both anticipated and measured.

Regardless, the issue of selecting measurement techniques in relation to
program objectives is essentially a matter of validity: Will a given technique
measure what I want to measure? While validity is unquestionably the most
critical aspect of any measurement endeavor, other qualities of measurement
techniques are also important. Unless a given technique can measure behavior
reliably, that is, consistently from one time to the next, the interpretation of
behavior changes in relation to an instructional program is next to impossible.
Even the most valid, reliable measures may not be feasible for use in some
programs due to complexities in administration, scoring, cost, or other factors.
Thus, practicality is still another important consideration. Finally, in the
case of norm-referenced measurements, the quality of norming procedures clearly
is crucial, especially when important decisions about individuals are to be made.
In short, the criteria of validity, reliability, and practicality are basic to
any process of selecting and evaluating measurement techniques. Readers who wish
more information about these criteria and their application to technique
selection and evaluation tasks are referred to Brown (1970), Cronbach (1970), and
the American Psychological Association publication concerning test standards
(1966).



SPECIAL CONSIDERATIONS IN MEASUREMENT

Despite its obvious value, the process of measurement is marked by many problems
and issues that demand the attention of educators. These problems and issues can
be grouped into three related categories: (1) academic or cultural bias, (2) the
general impact of testing on students, and (3) the ways in which test results are
used (Bloom, 1969; Goslin, 1968). In this section, these groupings of problems and
issues are examined. In this way, the need for an ethical approach to educational
measurement can be better clarified.

Academic or Cultural Bias

Most tests that are used with children of late preschool age and beyond
already contain "academic" components which require specific language,
discrimination, and conceptual skills (space, number, time) for successful
performance. These tests also frequently call for various amounts of scholastic
information (Stephens and Evans, in press). For purposes of assessing achievement
status and predicting subsequent scholastic success, this academic orientation is
suitable if two conditions can be met. First, a child must have had the
opportunity to become familiar with the test-related content in a general way.
Second, a child must have at least a minimal repertoire of test-taking skills.
Many young children obviously can meet neither condition. Therefore, a test with
a strong academic flavor may be a very poor sample of such children's past
learning.

Admittedly, a tester can do little about a child's limited familiarity with
culturally or academically based test content except, of course, to temper his
interpretation of the child's performance accordingly and avoid misusing the test
results. In other words, the problem lies less with tests per se than with the
test user.

The problem of test-taking skills, however, is one about which a tester can do
something more concrete. Personnel involved in test administration should
provide children with experiences that simulate test conditions prior to the
formal test itself. This includes practice in following directions, handling
test materials, and the like. A "warm-up" period prior to testing is also
advisable. This policy has been put into action: Oakland and Weilart (1971)
implemented special activities to develop disadvantaged children's familiarity
with tests and the skills necessary for test performance. These activities had
an initially strong, positive effect on the standardized test performance of these
children as compared to peers who received no special training.

Again, the problem of academic or cultural bias is basically one of test validity.
This problem often is most apparent when tests are given to make decisions about
children whose performance may be affected negatively by factors such as low
reading ability, low test-taking motivation, lack of familiarity with the language
and conceptual style of a test, negative attitudes toward school personnel and
academics, and poorly designed test formats or instructions (Freeburg, 1970). These
factors frequently are noted when conventional tests are used with children from
impoverished circumstances or certain ethnic and minority groups.

The argument that tests based on standard English discriminate negatively and
unfairly against children whose native language or dialect differs from the
standard form has recently gained much support. This argument underlies, in part,
the development of experimental measures of language competence and translations
of conventional tests (e.g., the Stanford-Binet Intelligence Scale) into
non-standard English forms. As far as variant English dialects are concerned, the
overall results of such efforts are mixed. For example, some authorities (e.g.,
Baratz, 1969) report salutary effects when tests are administered and scored in
terms of the child's native dialect. This is notably true in the case of measuring
native dialect proficiency. Similarly, others (e.g., Garvey and McFarlane, 1970)
have traced variations in standard English proficiency among black children from
lower socioeconomic homes to interference from their normal language pattern,
rather than to differences in academic ability.

In contrast, still other authorities (e.g., Quay, 1971) report that intelligence
test performance among black children is little affected by the language of test
administration, i.e., whether the test is given in standard English or black
dialect. Likewise, no reliable performance differential has been observed among
black children to whom a test of echoic responding was administered in either
standard English or black dialect (Stern and Gupta, 1970).

Of course, there are at least two basic problems inherent in studies designed to
compare children's performance on different linguistic forms of the same test. One
is that tests translated from standard English to a second language (e.g., Spanish
or French) may lose some validity in the very process of translation, to say
nothing of the questionable practice of using norms from the original version of
the test to describe performance on the translated version. A second problem is
that no one English dialect is likely to characterize all children from different
ethnic or racial minorities. Thus, comparisons of standard English with black
dialect, for example, may be gross and misleading.
Clearly, extensive research is needed to clarify the influence of dialect
on test-taking behavior. But tests based on standard English are certainly
unsuitable for use with any child who neither understands nor speaks the
language of the test. This means that a teacher must first ensure that a child
is sufficiently competent in standard English if such competence is necessary
for a valid test performance. Otherwise, alternative methods of measurement
must be sought in order to obtain any reasonably accurate portrait of a child's
achievement status. In short, the practice of using standard English measures to
classify early, or otherwise evaluate, the intellectual competence of children
whose native language is not English cannot be condoned. For an example of early
intervention research in which a bilingual approach to measurement is
illustrated, see Nedler and Sebere (1971).

General Impact of Testing on Students

As noted earlier in this chapter, a given testing program may have as its
purpose nothing more than obtaining useful information. However limited the
intended purposes of a testing program, the effect of testing will doubtless
extend beyond this point (Stephens and Evans, in press). Simply taking a test,
or expecting one for that matter, is bound to have various effects on most
children (Goodwin, 1966). Unfortunately, controlled research in this area of
psychology has been meager. But reasoned speculation combined with what little
research has been done can lead to the identification of some definite
possibilities. For example, testing effects may occur in advance of actual
testing by influencing the type and degree of preparation in which students
engage. In some cases, teachers (and parents) may even coach their charges both
in the tactics of test-taking and in the content of anticipated test situations.



Effects may also be realized during a test itself. For example, testing
can act as a form of teaching (Stephens and Evans, in press). In taking a test
early in an educational program, students may learn something about what they
will be studying (or evaluated on later) and also become sensitized to the
material that will be stressed. Students may thus be led to pay more attention
to this material when it is encountered (Entwhistle, 1961).

While this may often be desirable from a teacher's point of view
(particularly in the case of criterion-referenced measurement), testing may
occasionally have disadvantages. Some students, for example, in advancing
an erroneous answer, may become more committed to that answer. Subsequently,
they may have some difficulty in overcoming the misconception or inaccuracy
that their answer represents (Stephens and Evans, in press). Finally, every
precaution should be taken to avoid conveying the message to students that
test-taking and strong test performance are the be-all and end-all of education.

As Bloom (1969) suggests, the psychological effects on students during an
actual examination may be comparatively light, except for such possibilities as
excessive anxiety or emotional stress, frustration, self-doubt, or feelings of
failure (or accomplishment) that may be associated with a test situation.
Further, while fatigue at the end of an extended examination may occur among
some children, this effect is likely to be mild and short-lived.

But perhaps the most potentially serious outcome in this regard can come
from "overtesting" children in connection with early intervention programs. It
has been this writer's experience that some evaluators simply schedule too much
formal testing during the course of an early education program. Conceivably,
the net adverse effect of this problem could be at least threefold. First,
children may come to view testing negatively because it can mean unwelcome
intrusions and undue pressures for performance. Second, teacher resentment over
interruptions of their instructional program can accumulate to the point where
their cooperation with testers may suffer. Third, overtesting is subject to
the law of diminishing returns: that is, the time and expense involved in
extensive testing may not be worth the additional results obtained. It is also
possible that "too much" data can complicate, rather than facilitate, the
interpretation of program outcomes. This concern over excessive testing was
also discussed in the first evaluation chapter.

There is no universal rule for determining how many tests, and of what kind,
one should use in an evaluation plan. Nevertheless, a judicious perspective
on this issue is imperative.

It has been suggested that the temporary fatigue or inconvenience occasionally
associated with testing is a comparatively minor problem. However, prolonged
anxiety that may be engendered by testing and evaluation procedures is not.
Among other things, the literature concerned with test anxiety indicates that
teachers (testers) should avoid adding emotionalism to testing procedures by
dramatizing the hazards of doing poorly or suggesting that a student's future is
at stake in the testing situation. It is therefore extremely important to
acknowledge the potentially debilitating effects that intense emotion may have
on an individual's test performance, particularly where complex intellectual
tasks are involved. There is good evidence that individual differences in test
anxiety are apparent among children as early as kindergarten entrance and that
subsequent changes in measured anxiety level are linked to patterns of change in
achievement and intelligence test performance throughout the elementary school
period; in general, it is the highly anxious child or youth who stands to suffer
the most in this regard (Sarason et al., 1960; Sarason, Hill, and Zimbardo, 1964).
Thus far a few ideas concerning the impact of testing on students before
and during examination periods have been presented. While these dimensions of
the problem are important, postexamination effects possibly are the most
profound, depending upon what uses are made of test results. As Bloom (1969) has
observed, these effects may be minimal or maximal, positive or negative, but
they can neither be completely controlled nor entirely neutralized.

Post-testing effects are in part a function of the type of test used and the way
in which it is used. For example, at least three categories of tests carry a high
potential for lasting effects: (1) tests designed to measure significant and
relatively stable human qualities, such as tests of intelligence and aptitudes;
(2) tests that are used to facilitate major educational decisions, for example,
tests for admission to certain academic programs, certification of satisfactory
completion of an educational program, and the like; and (3) tests whose results
become a permanent part of a student's record or that are made public for one
reason or another (Bloom, 1969). Extreme care must therefore be exercised in
regard to the selection, administration, and interpretation of tests used for such
purposes. This leads to an explicit ethical consideration of the way in which test
results may be used.

The Ethical Use of Test Results 

Concern about the possible misuses of test results is represented by an
extensive literature (e.g., Black, 1963; Dyer, 1961; Ebel, 1964; Hoffman, 1962;
Mehrens and Lehmann, 1969). Among the more critical potential misuses of tests
discussed in this literature are four summarized by Ebel (1964). These misuses
serve both to illustrate the current and widespread professional concern about
tests and to suggest to the reader some guidelines for his own policy formulation
about tests.
First, it is conceivable that imprudent educational testing can indelibly
mark a student's intellectual status as superior, average, or inferior. If so,
his subsequent academic or social status could be more or less predetermined by
way of expectancies that become established among those privy to test results
and by decisions about educational programming that come about through testing.
This need not necessarily be destructive to the individual. But an individual
who is "assigned" a label of "weak student" on the basis of, say, an intelligence
test score may be adversely affected both in self-esteem and motivation for
future achievement.

Second, it is possible that certain testing practices can generate a
restricted concept of human abilities, one based largely on degrees of success
in intellectual achievement situations. This sort of concept may in turn
lead to a focus upon the attainment of limited goals, often at the expense of
educational practices that are designed to facilitate the development of diverse
talent.

A third possible misuse of test results concerns the exercise of Machiavellian
tendencies among those in charge of testing programs. By this is meant the
exercise of excessive and unwarranted control over the personal destinies
of children.

Finally, poorly conceived testing practices may foster rigid, mechanistic,
and depersonalized approaches to measurement and evaluation that, in effect,
could limit basic human rights and impede positive human relations within the
schools.

Such distasteful outcomes are not inevitable. However, it is clear that
steps must be taken to guarantee that these outcomes do not materialize. Tests
should be viewed as but one of several means for increasing student achievement
by way of motivating and directing the energies of students and teachers alike.
Furthermore, the use of tests to impose certain decisions and courses of action
upon others should, as much as possible, give way to their use in providing data
for individual decision making (Ebel, 1964). As previously indicated, the issue
therefore concerns the way tests are used more than the nature of tests per se.

This writer has often been impressed by the negative emotionalism associated
with tests and their use among many parents, college students, and teachers.
Frequently this emotionalism leads to exaggerated claims about the inhumane or
even subversive nature of testing, particularly intelligence and personality
testing. Such emotionalism undoubtedly is kindled by inadequate understanding of
tests and their uses. But instances of unwise test use in the schools can provide
justification for much criticism. At the extreme, tests may be confiscated and
burned by opponents of psychometrics (Nettles, 1959; Eron and Walden, 1961). Less
extreme, but indicative of resistance, is refusal by parents to submit their
children to testing in the school setting. Still another area of conflict for
educators is the matter of when, by what means, and how extensively parents
should be informed about their children's test performance. As a matter of
course, professionals who are responsible for specific programs of childhood
education and research must develop a judicious policy in relation to these
problems. At the very least, this policy should include advance parental
permission for testing and an acceptable method for communicating its purposes
and outcomes to concerned parents. This recommendation, of course, is based on
the assumption that parents have the right of access to their children's school
records.
Unfortunately, virtually no empirical data exist concerning the effects on parents
of receiving information about their children's test performance (Kirkland, 1971).
Research in this area is sorely needed in order to glean better clues for policy
formulation about school testing practices.6

It is also pertinent to consider the social consequences of abandoning tests.
In a broad sense, the case for no testing is much the same as that advanced against
evaluation in general. It is unnecessary to elaborate in great detail on this
issue. However, the following quote summarizes well a reasoned position on this
matter:

"If the educational tests were abandoned, the encouragement and reward of
individual efforts would be made more difficult. Excellence in programs
of education would become less tangible as a goal and less demonstrable
as an attainment. Educational opportunities would be extended less on
the basis of aptitude and merit and more on the basis of ancestry and
influence; social class barriers would become less permeable. Decisions
on important issues of curriculum and method would be made less on the
basis of solid evidence and more on the basis of prejudice or caprice.
These ... are likely to be the more harmful consequences, by far. Let us
not forego the wise use of tests." (Ebel, 1964, p. 334).

Implicit in this passage is the notion that problems of interpreting and using
test results may occur largely because of certain deficiencies, for example,
lack of knowledge about the limitations of tests and about the technical and
theoretical aspects of testing. Yet if test results are to be useful, they must
be communicated to those directly responsible for students, singly or in groups
(Levine, 1966). Again, the task here is one of establishing a sound policy for
communication, one that includes safeguards for student welfare.



6 Readers interested in further study of testing policy are referred to Vol. 20,
No. 11 of the American Psychologist (1965). The entire issue is devoted to
issues and ethics associated with testing and public policy.

This section of the chapter can be concluded with a reference to a position
statement on psychological assessment recently adopted by the American
Psychological Association (1970). This statement in fact represents a policy for
testing and the use of test results formulated in relation to the essential
features of psychological assessment. As such, it is an extremely important
policy for consideration by school personnel everywhere.

1. Guaranteed protection must be provided for every individual
against unwarranted inferences by educational personnel ill-equipped
with the necessary background knowledge and skill in testing.

2. Obsolete information (dated test results and the like) that
might lead to unfavorable evaluation of an individual must be
periodically culled from personal records in order to protect that
individual.

3. Unnecessary intrusions into one's privacy must be avoided;
irrelevant tests and questions have no place in a well-designed
assessment program.

4. Given the above modes of protection, procedures should be
established to facilitate continual investigation of new and improved
techniques of assessment.

While these guidelines are pertinent to ability testing, they are perhaps
even more applicable to personality assessment. In either case, the key
concept again is relevance. That is, measurement procedures should have
demonstrable relevance to the particular purposes of evaluation, whether one
wishes to evaluate academic competence, instructional effectiveness, or
personal-social development. However, one must simultaneously determine
relevance in relation to an ethical framework and criteria of social
acceptability.

SUMMARY AND CONCLUSIONS 

This essay represents a selective overview of contemporary measurement
practices in early childhood education. Although techniques for the measurement
of young children's behavior received primary attention, comment was extended
to procedures for measuring both teacher and parent behavior.

A number of emergent trends in current measurement practice were
cited. Those with perhaps the deepest implications for early education reflect
attempts to broaden the spectrum of measurement in several directions, including
a greater sensitivity to cultural or ethnic interests and a stronger focus upon
children's affective development. However, much technical work remains to be
done before measures relevant to these foci can confidently be applied in the
field.

Measurement was considered along three basic lines of thought: what, how,
and when. Answering the question of what to measure requires careful
scrutiny of educational objectives. In turn, the specification of objectives
is instrumental in determining the how of measurement. Consistency among
program objectives, instructional content, and measurement procedures is
imperative for sound curriculum evaluation.

It is clear that interviews, systematic observational procedures, and
tests (standardized and otherwise) continue to dominate educational practice.
However, many creative variations of these techniques have recently appeared.
Moreover, a rebirth of mastery approaches to learning is apparent from the
widespread interest in criterion-referenced measurement. Measurement is also
being applied increasingly to variables other than strict behavioral outcomes
of educational programs. These variables include curriculum components,
curriculum materials, teacher-child interaction, and the physical environment
for education. Among other things, this means that comprehensive measurement
will include the input and process phases of instruction, as well as the
traditional output phase. A precise conceptualization of basic input, process,
and output variables can also help frame more sharply the question of when to
measure.
Finally, the research literature reviewed herein suggests that direct
behavioral measures are generally preferable to those that only provide data
for inferences about hypothetical constructs, primarily for reasons of validity.
Validity is also a central issue in the special problems involved when many
conventional measures are used with individuals from varying cultural
backgrounds. Cultural bias, along with the possible psychological effects of
testing on children and the ethical use of test results, were identified as
phenomena that warrant the attention of teachers and research workers alike.